Abstract

A $t$-$a$ covering array is an $m \times n$ matrix, with entries from an
alphabet of size $a$, such that for any choice of $t$ rows, and any ordered
string of $t$ letters of the alphabet, there exists a column such that
the "values" of the rows in that column match those of the string of
letters. We use the Lovász Local Lemma in conjunction with a new
tiling-based probability model to improve the upper bound on the
smallest number of columns $N = N(m, t, a)$ of a $t$-$a$ covering array.

1 Introduction 

Consider an $m \times n$ matrix with entries from the "alphabet"
$A = \{1, 2, \ldots, a\}$. Let the $(i, j)$th entry be represented by
$r_{i,j}$. We say that this matrix is a $t$-$a$-covering matrix or a
$t$-$a$-covering array if, given any $t$ rows $p_1, p_2, \ldots, p_t$ of
the matrix and any vector $(v_1, v_2, \ldots, v_t)$ with $v_i \in A$, there
exists a column $q$ such that

$$(v_1, v_2, \ldots, v_t) = (r_{p_1,q}, r_{p_2,q}, \ldots, r_{p_t,q}).$$
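The definition can be checked mechanically on small matrices. The following is a minimal sketch, assuming the matrix is given as a list of rows with entries in $\{1, \ldots, a\}$; the function name `is_covering_array` and the example matrix are illustrative and do not come from the paper.

```python
from itertools import combinations

def is_covering_array(matrix, t, a):
    """Return True if every choice of t rows exhibits all a**t vectors.

    Assumes every entry of `matrix` lies in {1, ..., a}.
    """
    m = len(matrix)
    n = len(matrix[0]) if m else 0
    needed = a ** t
    for rows in combinations(range(m), t):
        # Collect the distinct column vectors seen in the selected rows.
        seen = {tuple(matrix[p][q] for p in rows) for q in range(n)}
        if len(seen) < needed:
            return False
    return True

# Illustrative 3 x 4 matrix over {1, 2}: every pair of rows shows all 4 vectors.
example = [
    [1, 1, 2, 2],
    [1, 2, 1, 2],
    [1, 2, 2, 1],
]
print(is_covering_array(example, 2, 2))
print(is_covering_array(example, 3, 2))
```

The check enumerates all $\binom{m}{t}$ row sets, so it is only practical for small $m$ and $t$.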




Extensive surveys of covering arrays may be found in the papers of Sloane
[5] and Colbourn [3]. Given $t$, $m$ and the alphabet size $|A| = a$, we wish
to find the minimum number of columns, $n$, such that there exists an
$m \times n$ matrix that is $t$-covering. We will define $N = N(m, t, a)$ as
the smallest positive integer $n$ such that there exists a covering array of
dimensions $m \times n$. At the Coimbra Zero-One Matrix Conference, the
second author talked about the need to introduce new probability models to
improve upper bounds on $N(m, t, a)$ and the corresponding numbers for
partial covering arrays [2]. In this paper we propose a specific way of
doing so, once again using the Lovász local lemma as an auxiliary tool.

Lemma 1 The Lovász Local Lemma: Let $C_1, C_2, \ldots, C_K$ be events
in an arbitrary probability space. Suppose that each event $C_i$ is mutually
independent of the set of all the other events $C_k$ except for at most $d$
of them, and that $P(C_i) \le p$ for all $1 \le i \le K$. If
$ep(d + 1) \le 1$, then $P\left(\bigcap_{i=1}^{K} C_i'\right) > 0$.

Let $R$ be the index set of all sets of $t$ rows; $|R| = \binom{m}{t}$. For
$r \in R$, let $C_r$ be the event that the $r$th row set does not contain
some vector $(v_1, v_2, \ldots, v_t)$ in any of its columns. We wish to prove
that $P\left(\bigcap_{r \in R} C_r'\right) > 0$ if $n \ge N_0$, proving that
$N(m, t, a) \le N_0$. Now in [1] a general upper bound was provided on
the size of covering arrays; this was

$$N(m, t, a) \le N := (t - 1)\,\frac{\log m}{\log\left(\frac{a^t}{a^t - 1}\right)}\,\{1 + o(1)\}. \qquad (1)$$

The proof used an elementary probability model that consisted of placing
one letter of the alphabet independently in each of the $mn$ positions with
probability $1/a$, i.e., by letting $P(r_{i,j} = x) = 1/a$ for all $x \in A$.
In the same paper, a special probability model was used, but only for the
case $a = 2$, $t = 3$. Here the authors of [1], following the approach used
in the doctoral thesis of Roux (see, e.g., [5]), used a probability model
that independently places an equal number of zeros and ones in the rows of
the matrix (the so-called "fixed weight rows" model). Unfortunately this
method becomes quite intractable in general, and it is our intent in this
paper to explore a probability model that is, in some sense, intermediate
between the general technique in [1] and the special method used there for
$a = 2$, $t = 3$. Specifically, we seek to improve the general bound (1)
using the method of placing consecutive and equally weighted tiles along the
rows. We use tiles of dimension $1 \times ka$, such that there are exactly
$k$ $x$'s in each tile for each $x \in A$. By way of comparison, the general
method used $1 \times 1$ tiles that led to a loss of control over the numbers
of letters of each type in any row, while Roux's method used a single long
tile in each row, i.e., corresponded to $k = n/2$ ($n$ even).
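The tiling model is straightforward to sample from. The sketch below is one possible implementation under the assumption that $ka$ divides $n$ and that each tile is a uniformly random arrangement of $k$ copies of every letter; the name `random_tiled_matrix` and the `seed` parameter are illustrative choices, not part of the paper.

```python
import random

def random_tiled_matrix(m, n, a, k, seed=None):
    """Fill an m x n matrix row by row with independent 1 x (k*a) tiles.

    Each tile is a uniformly random arrangement of k copies of every
    letter in {1, ..., a}. Assumes that k*a divides n.
    """
    width = k * a
    if n % width != 0:
        raise ValueError("tile width k*a must divide n")
    rng = random.Random(seed)
    base = [x for x in range(1, a + 1) for _ in range(k)]
    matrix = []
    for _ in range(m):
        row = []
        for _ in range(n // width):
            tile = base[:]      # k copies of each letter
            rng.shuffle(tile)   # uniformly random arrangement
            row.extend(tile)
        matrix.append(row)
    return matrix

sample = random_tiled_matrix(m=4, n=12, a=3, k=2, seed=0)
```

With $k = 1$ each tile is a random permutation of the alphabet, and with $a = 2$, $k = n/2$ the sampler reduces to the fixed weight rows model described above.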

We consider two cases: (i) $k = 1$, which yields an elementary equation
relating $N(m, t, a)$ and the variables $m$, $t$ and $a$; and (ii) $k > 1$,
which yields better bounds as $k$ increases, but which generates increasingly
more complicated solutions.

(i) We start with the case $k = 1$, and fill in our matrix using tiles that
contain one randomly placed copy of each letter of the alphabet, assuming
that $a \mid n$. Note that there are a total of $a^t$ possible vectors, and by
the symmetry of our construction, all are equally likely to occur in the
selected rows. Thus $P(C_r) \le \lambda a^t$, where $\lambda$ is the
probability that a specific vector $z^* = (z_1, z_2, \ldots, z_t)$ is missing
in the set $r$ of selected rows. Select an arbitrary set of $t$ rows in the
matrix. Consider the columns in any vertically aligned set of tiles. For each
$z_i$, there is exactly one value in any tile equal to $z_i$, and $a$ places
it can be; moreover $z^*$ cannot occur in more than one column of the
vertically stacked tiles in the selected rows. Therefore, the probability that
$z^*$ is somewhere in these tiles is $a \cdot (1/a)^t = (1/a)^{t-1}$. Since
there are $n/a$ tiles in any row of the $m \times n$ matrix, and the
composition of these is determined independently, we have

$$P(C_r) \le a^t \left(1 - \frac{1}{a^{t-1}}\right)^{n/a}. \qquad (2)$$
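The probability $(1/a)^{t-1}$ that a fixed vector appears in one vertical stack of $k = 1$ tiles can be confirmed by exhaustive enumeration for small parameters. This is only a sanity check under the stated model; the helper name `hit_probability_k1` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def hit_probability_k1(a, t, z):
    """Exact probability that vector z is a column of t stacked k = 1 tiles.

    Each tile is a uniformly random permutation of {1, ..., a}; the t tiles
    are independent, so all stacks are equally likely.
    """
    tiles = list(permutations(range(1, a + 1)))
    hits = 0
    total = 0
    for stack in product(tiles, repeat=t):
        total += 1
        if any(all(stack[i][c] == z[i] for i in range(t)) for c in range(a)):
            hits += 1
    return Fraction(hits, total)

# For a = 3, t = 3 the claimed value is (1/3)**2 = 1/9.
print(hit_probability_k1(3, 3, (1, 2, 3)) == Fraction(1, 9))
```

The enumeration visits $(a!)^t$ stacks, so it is feasible only for very small $a$ and $t$.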



We can improve this bound slightly by using a technique found in [2], where
the vectors $z_i = (i, i, \ldots, i)$, $1 \le i \le a$, can be achieved for
all sets $r$ by including columns consisting of all $i$'s. There are $a$ of
these vectors; thus this reduces the number of $z^*$'s from $a^t$ to
$a^t - a$. We can ignore these vectors in our calculation of $P(C_r)$ so long
as we remember to add $a$ columns to our value $N(m, t, a)$. So (2) may be
improved as follows:

$$P(C_r) \le (a^t - a)\lambda,$$

and thus

$$P(C_r) \le (a^t - a)\left(1 - \frac{1}{a^{t-1}}\right)^{n/a}. \qquad (3)$$
Our next step is to calculate $d$. For any set $r$ of rows, there will be a
dependency only on sets $r' \in R$ such that $r \cap r' \ne \emptyset$. We
will bound the number of such $r'$s by choosing one row from $r$, and then
choosing an arbitrary $t - 1$ rows from the $m - 1$ other rows in the matrix.
Thus $d \le t\binom{m-1}{t-1}$, so $d + 1 \le \frac{t m^{t-1}}{(t-1)!}$.
Substituting this into the Lovász local lemma we get
$P\left(\bigcap_{r \in R} C_r'\right) > 0$ if

$$e \cdot \frac{t m^{t-1}}{(t-1)!}\,(a^t - a)\left(1 - \frac{1}{a^{t-1}}\right)^{n/a} \le 1,$$

i.e., if

$$n \ge \frac{a(t-1)\log_2(m)}{\log_2\left(\frac{a^{t-1}}{a^{t-1}-1}\right)}\left\{1 + \frac{\log_2(a^t - a)}{(t-1)\log_2(m)} + \frac{\log_2(et)}{(t-1)\log_2(m)} - \frac{\log_2((t-1)!)}{(t-1)\log_2(m)}\right\},$$

i.e., if

$$n \ge \frac{a(t-1)\log_2(m)}{\log_2\left(\frac{a^{t-1}}{a^{t-1}-1}\right)}\,\{1 + o(1)\}, \quad m \to \infty.$$

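It follows that

$$N(m, t, a) \le \frac{a(t-1)\log_2(m)}{\log_2\left(\frac{a^{t-1}}{a^{t-1}-1}\right)}\,\{1 + o(1)\}, \qquad (4)$$

since adding back, into (4), the $a$ columns we removed earlier only changes
the $o(1)$ term. Notice that the above process gives us both a precise and an
asymptotic bound for $N(m, t, a)$. Note too that (4) gives an improvement
over the previous best bound (1) due to the fact that

$$1 - \frac{1}{a^{t-1}} < \left(1 - \frac{1}{a^t}\right)^a,$$

which follows from Bernoulli's inequality for $a \ge 2$.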
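The precise (non-asymptotic) form of the bound for $k = 1$ can be evaluated directly from the local lemma condition $ep(d + 1) \le 1$. The sketch below assumes the estimates $p = (a^t - a)(1 - a^{1-t})^{n/a}$ and $d + 1 \le t m^{t-1}/(t-1)!$ used above, restricts $n$ to multiples of $a$, and then adds back the $a$ constant columns; the function name `smallest_n_k1` is illustrative.

```python
import math

def smallest_n_k1(m, t, a):
    """Smallest n (a multiple of a, plus a extra columns) for which the
    k = 1 tiling model satisfies e * p * (d + 1) <= 1.

    Uses p = (a^t - a) * (1 - a^(1-t))^(n/a) and
    d + 1 <= t * m^(t-1) / (t-1)!, working with natural logarithms.
    """
    # log of the bound on d + 1; lgamma(t) = log((t-1)!)
    log_d1 = math.log(t) + (t - 1) * math.log(m) - math.lgamma(t)
    # log of e * (d + 1) * (a^t - a)
    log_rhs = 1.0 + log_d1 + math.log(a ** t - a)
    # each tile contributes a factor (1 - a^(1-t)) to p
    per_tile = -math.log1p(-(a ** (1 - t)))
    tiles = max(0, math.ceil(log_rhs / per_tile))
    return a * tiles + a

n_needed = smallest_n_k1(m=100, t=3, a=2)
```

For moderate $m$ the lower-order terms are far from negligible, so the value returned here can be noticeably larger than the leading term of (4).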



(ii) We now consider the case $k > 1$; recall that the size of our tiles is
$1 \times ka$. First note that the size of the tile does not change $d$, and
thus $d + 1 \le \frac{t m^{t-1}}{(t-1)!}$ as before. We next reconsider
$P(C_r)$, and compute it using inclusion-exclusion. Let

$$\gamma_k = \frac{\sum_{j=1}^{k} (-1)^{j-1} \binom{ak}{j} \binom{ak - j}{k - j,\, k,\, \ldots,\, k}^{t}}{\binom{ak}{k,\, k,\, \ldots,\, k}^{t}} \qquad (5)$$

be the probability that a given vector $z^*$ is in a given vertical array of $t$ tiles.
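As a sanity check on (5), the sum can be compared with a direct enumeration of all stacks of tiles for small parameters. The code below is a sketch under the assumption that each tile is a uniformly random arrangement of $k$ copies of each letter; the helper names `multinomial`, `gamma_formula` and `gamma_bruteforce` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb, factorial

def multinomial(total, parts):
    """Number of arrangements of a multiset with the given part sizes."""
    assert sum(parts) == total
    out = factorial(total)
    for p in parts:
        out //= factorial(p)
    return out

def gamma_formula(a, k, t):
    """gamma_k from the inclusion-exclusion sum (5), as an exact fraction."""
    denom = multinomial(a * k, [k] * a) ** t
    num = sum(
        (-1) ** (j - 1) * comb(a * k, j)
        * multinomial(a * k - j, [k - j] + [k] * (a - 1)) ** t
        for j in range(1, k + 1)
    )
    return Fraction(num, denom)

def gamma_bruteforce(a, k, z):
    """Enumerate all stacks of len(z) tiles; count those with z as a column."""
    base = [x for x in range(1, a + 1) for _ in range(k)]
    tiles = sorted(set(permutations(base)))  # distinct tile arrangements
    t = len(z)
    hits = sum(
        any(all(stack[i][c] == z[i] for i in range(t)) for c in range(a * k))
        for stack in product(tiles, repeat=t)
    )
    return Fraction(hits, len(tiles) ** t)

# a = 2, k = 2, t = 3: both computations should agree.
print(gamma_formula(2, 2, 3) == gamma_bruteforce(2, 2, (1, 2, 1)))
```

For $k = 1$ the sum has a single term and reduces to $a \cdot (1/a)^t$, the value used in case (i).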
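This yields $\lambda = \lambda_k = (1 - \gamma_k)^{n/(ka)}$ and hence

$$P(C_r) \le (a^t - a)(1 - \gamma_k)^{n/(ka)},$$

so that the Lovász local lemma yields
$P\left(\bigcap_{r \in R} C_r'\right) > 0$ if

$$e \cdot \frac{t m^{t-1}}{(t-1)!}\,(a^t - a)(1 - \gamma_k)^{n/(ka)} \le 1,$$

i.e., if

$$n \ge \frac{ka(t-1)\log_2(m)}{\log_2\left(\frac{1}{1-\gamma_k}\right)}\left\{1 + \frac{\log_2(a^t - a)}{(t-1)\log_2(m)} + \frac{\log_2(et)}{(t-1)\log_2(m)} - \frac{\log_2((t-1)!)}{(t-1)\log_2(m)}\right\},$$

or

$$n \ge \frac{ka(t-1)\log_2(m)}{\log_2\left(\frac{1}{1-\gamma_k}\right)}\,\{1 + o(1)\}, \quad m \to \infty.$$

It follows that

$$N(m, t, a) \le \frac{ka(t-1)\log_2(m)}{\log_2\left(\frac{1}{1-\gamma_k}\right)}\,\{1 + o(1)\}. \qquad (6)$$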
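The leading coefficients of $\log_2(m)$ in (1), (4) and (6) are easy to evaluate numerically. The following sketch assumes the forms of (1), (5) and (6) given above and uses floating-point arithmetic; the function names are illustrative. For $t = 3$, $a = 2$ it gives approximately 10.38 for bound (1) and 9.64 for $k = 1$.

```python
from math import comb, factorial, log2

def multinomial(total, parts):
    """Number of arrangements of a multiset with the given part sizes."""
    out = factorial(total)
    for p in parts:
        out //= factorial(p)
    return out

def gamma_k(a, k, t):
    """gamma_k from the inclusion-exclusion sum (5), as a float."""
    denom = multinomial(a * k, [k] * a) ** t
    num = sum(
        (-1) ** (j - 1) * comb(a * k, j)
        * multinomial(a * k - j, [k - j] + [k] * (a - 1)) ** t
        for j in range(1, k + 1)
    )
    return num / denom

def coeff_bound_1(t, a):
    """Coefficient of log2(m) in the general bound (1)."""
    return (t - 1) / log2(a ** t / (a ** t - 1))

def coeff_tiling(t, a, k):
    """Coefficient of log2(m) in the tiling bound (6); k = 1 gives (4)."""
    return k * a * (t - 1) / log2(1 / (1 - gamma_k(a, k, t)))

for k in (1, 2, 3):
    print(k, round(coeff_tiling(3, 2, k), 2))
print("bound (1):", round(coeff_bound_1(3, 2), 2))
```

Exact integer arithmetic is used for the multinomial coefficients, so the only rounding occurs in the final division and logarithm.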



Comments It is clear that as we increase $k$ from 1 to $n/a$, the bound on
$N(m, t, a)$ becomes better and better, while the equation to solve for it
becomes more and more convoluted. Take, for example, the case when $t = 3$,
$a = 2$. The previous best known bound (1) for a general $N(m, t, a)$ yields
the solution $N(m, 3, 2) \le 10.38 \log_2(m)\{1 + o(1)\}$, while the best
known solution for this specific case (Roux [4]) yields
$N(m, 3, 2) \le 7.56 \log_2(m)\{1 + o(1)\}$, a result obtained by equally
weighting all the rows to have the same number of 1's and 0's. The solution
obtained via tiling yields $N(m, 3, 2) \le 9.64 \log_2(m)\{1 + o(1)\}$ when
$k = 1$. With this we can see that even the simplest case of the tiling
solution, $k = 1$, offers a fairly significant improvement in the bounds,
while more complex solutions will provide better bounds for the size of a
covering array. A few values of $N(m, t, a)$ as given by (4) and (6) may be
found in the following table; $k = 0$ refers to the bound in (1):
2 Open Problems



Perhaps the overarching open problem is that of using alternative probability 
models in order to tease out better and better bounds on the size of minimal 
covering arrays. Markov models and others involving global dependence are 
one option. A method more relevant to the central problem addressed at the
Coimbra conference would, however, be to work with zero-one or
alphabet-based matrices with fixed row and column totals (in this paper we
fix just the row totals!). Last but not least, can we let k go to infinity (at a relatively
slow rate) and analyze the sum in (5)? Can we conduct the analysis with 
k = n/a? 